Voice evaluation is the systematic assessment of the function, quality, and characteristics of human voice production using perceptual, acoustic, aerodynamic, and visual methods. It is central to clinical diagnosis, research, and training concerned with vocal health, voice disorders, and vocal performance.
Objective Quantitative Foundations (1972–1985)
Perceptual and Statistical Integration (1986–2008)
Multimodal Cross-domain Standardization (2009–2022)
Objective Quantitative Foundations era
James L. Flanagan [1], affiliated with Massachusetts Institute of Technology [3] and Johns Hopkins University [4] in this era, helped establish objective foundations for voice evaluation. The 1972 paper he co-authored, Synthesis of Voiced Sounds From a Two-Mass Model of the Vocal Cords [6], introduced a physical model of voice production that underpinned quantitative acoustic analysis. Joan Kwiatkowski [2], affiliated with University of Wisconsin–Madison [5], contributed to this era as a co-developer of A Procedure for Phonetic Transcription by Consensus [7]. This consensus-based approach improved the reliability of perceptual ratings and linked them to instrumented assessment practices and normative benchmarking.
Perceptual and Statistical Integration era
Richard C. Rose [1], active at Massachusetts Institute of Technology [3] and Emory University [4], and D.A. Reynolds [2], active at Massachusetts Institute of Technology [3] and Georgia Institute of Technology [5], are the key contributors of this era. Their joint paper, Robust text-independent speaker identification using Gaussian mixture speaker models [6], showed that Gaussian mixture models can identify a speaker reliably regardless of the text being spoken. By addressing variability across texts, the method enabled more reproducible cross-center comparisons and contributed to standardized benchmarking in voice evaluation.
Multimodal Cross-domain Standardization era
Jennifer Oates [1] is a prominent figure in multimodal voice evaluation during the 2009–2022 era, with affiliations at La Trobe University [3] and Comenius University Bratislava [4]. Her 2009 work, Auditory-Perceptual Evaluation of Disordered Voice Quality [7], helped establish standardized perceptual assessment methods, a foundation for later cross-domain, data-driven voice quality frameworks used in real-world settings. Daryush D. Mehta [2] is a notable contributor in this era, affiliated with the Harvard–MIT Division of Health Sciences and Technology [5] and Harvard University [6]. His papers Mobile Voice Health Monitoring Using a Wearable Accelerometer Sensor and a Smartphone Platform [8] and Duration of ambulatory monitoring needed to accurately estimate voice use [9] advanced real-world voice-health assessment, enabling long-term monitoring and robust estimation of voice use in variable everyday environments.